Efficient language model adaptation through MDI estimation

Author

  • Marcello Federico
Abstract

This paper presents a method for n-gram language model adaptation based on the principle of minimum discrimination information (MDI). A background language model is adapted to fit constraints on its marginal distributions that are derived from new observed data. This work gives a different derivation of the model by Kneser et al. (1997) and extends its application to interpolated language models. The proposed method has been evaluated on an Italian 60K-word broadcast news task.
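The abstract does not give the formulas, so the following is only a rough sketch of the unigram-rescaling form commonly associated with MDI adaptation, not the paper's own implementation. It assumes the adapted model has the shape p_adapted(w | h) = p_bg(w | h) * alpha(w) / Z(h), with alpha(w) = (p_adapt(w) / p_bg(w)) ** gamma. The function name `mdi_adapt_bigram`, the `gamma` exponent, the additive `floor` smoothing, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def mdi_adapt_bigram(p_bg_bigram, p_bg_unigram, adapt_tokens, gamma=0.5, floor=1e-6):
    """Rescale a background bigram model toward adaptation-data unigram statistics.

    p_bg_bigram:  {history: {word: p_bg(word | history)}}, each inner dict
                  covering the full vocabulary (an assumption of this sketch).
    p_bg_unigram: {word: p_bg(word)}, the background unigram marginal.
    adapt_tokens: list of tokens observed in the new (adaptation) data.
    """
    vocab = list(p_bg_unigram)
    counts = Counter(adapt_tokens)
    total = sum(counts[w] for w in vocab)

    # Unigram estimate from the adaptation data, with simple additive
    # smoothing so that unseen words keep a non-zero probability
    # (hypothetical choice, not taken from the paper).
    denom = total + floor * len(vocab)
    p_adapt = {w: (counts[w] + floor) / denom for w in vocab}

    # Per-word scaling factor: ratio of adaptation to background unigram
    # probabilities, damped by the exponent gamma.
    alpha = {w: (p_adapt[w] / p_bg_unigram[w]) ** gamma for w in vocab}

    adapted = {}
    for history, dist in p_bg_bigram.items():
        # Z(history) renormalizes the rescaled distribution to sum to 1.
        z = sum(dist[w] * alpha[w] for w in vocab)
        adapted[history] = {w: dist[w] * alpha[w] / z for w in vocab}
    return adapted, alpha


# Toy usage: a two-word vocabulary with a uniform background model.
background_bigram = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}
background_unigram = {"a": 0.5, "b": 0.5}
adapted_model, scaling = mdi_adapt_bigram(
    background_bigram, background_unigram, ["a", "a", "a", "b"], gamma=1.0
)
```

In this toy setting, the adaptation data contains "a" three times as often as "b", so the adapted conditional distributions shift probability mass toward "a". The per-history normalization term is the expensive part in a real model, since it has to be computed for every history.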

Similar resources

MDI adaptation for the lazy: avoiding normalization in LM adaptation for lecture translation

This paper provides a fast alternative to Minimum Discrimination Information-based language model adaptation for statistical machine translation. We provide an alternative to computing a normalization term that requires computing full model probabilities (including back-off probabilities) for all n-grams. Rather than re-estimating an entire language model, our Lazy MDI approach leverages a smoo...

Constraint selection for topic-based MDI adaptation of language models

This paper presents an unsupervised topic-based language model adaptation method which specializes the standard minimum information discrimination approach by identifying and combining topic-specific features. By acquiring a topic terminology from a thematically coherent corpus, language model adaptation is restrained to the sole probability re-estimation of n-grams ending with some topic-speci...

Topic Adaptation for Lecture Translation through Bilingual Latent Semantic Models

This work presents a simplified approach to bilingual topic modeling for language model adaptation by combining text in the source and target language into very short documents and performing Probabilistic Latent Semantic Analysis (PLSA) during model training. During inference, documents containing only the source language can be used to infer a full topic-word distribution on all words in the ...

Dynamic language modeling for broadcast news

This paper describes some recent experiments on unsupervised language model adaptation for transcription of broadcast news data. In previous work, a framework for automatically selecting adaptation data using information retrieval techniques was proposed. This work extends the method and presents experimental results with unsupervised language model adaptation. Three primary aspects are conside...

Crosslingual tandem-SGMM: exploiting out-of-language data for acoustic model and feature level adaptation

Recent studies have shown that speech recognizers may benefit from data in languages other than the target language through efficient acoustic model- or feature-level adaptation. Crosslingual Tandem-Subspace Gaussian Mixture Models (SGMM) are successfully able to combine acoustic model- and feature-level adaptation techniques. More specifically, we focus on under-resourced languages (Afrikaans in ou...

Publication date: 1999